
    Evaluation of methods for predicting the topology of β-barrel outer membrane proteins and a consensus prediction method

    BACKGROUND: Prediction of the transmembrane strands and topology of β-barrel outer membrane proteins is of interest in current bioinformatics research. Several methods have been applied to this task so far, using different algorithmic techniques, and a number of freely available predictors exist. The methods can be broadly divided into those based on Hidden Markov Models (HMMs), on Neural Networks (NNs) and on Support Vector Machines (SVMs). In this work, we compare the different available methods for topology prediction of β-barrel outer membrane proteins. We evaluate their performance on a non-redundant dataset of 20 β-barrel outer membrane proteins of Gram-negative bacteria, with structures known at atomic resolution. We also describe, for the first time, an effective way to combine any chosen set of individual predictors into a single consensus prediction method. RESULTS: We assess the statistical significance of the performance of each prediction scheme and conclude that the Hidden Markov Model based methods HMM-B2TMR, ProfTMB and PRED-TMBB are currently the best predictors, according to the per-residue accuracy, the segments overlap measure (SOV) and the total number of proteins with correctly predicted topologies in the test set. Furthermore, we show that the available predictors perform better when only transmembrane β-barrel domains are used for prediction, rather than the precursor full-length sequences, even though the HMM-based predictors are not influenced significantly. The consensus prediction method performs significantly better than each individual available predictor, increasing accuracy by up to 4% in SOV and by up to 15% in correctly predicted topologies. CONCLUSIONS: The consensus prediction method described in this work optimizes the predicted topology with a dynamic programming algorithm and is implemented in a web-based application freely available to non-commercial users at
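The abstract above only states that the consensus topology is optimized with dynamic programming; it does not give the algorithm. The following is a minimal sketch of one way such a consensus could work, not the authors' implementation. It assumes each predictor returns a label string of equal length over the alphabet `i` (inside), `M` (transmembrane strand) and `o` (outside), scores each residue by the fraction of predictors voting for a label, and uses a Viterbi-style pass over a hypothetical four-state grammar to return the best path in which strands alternate between the two sides of the membrane. The names `consensus_topology`, `STATES`, `ALLOWED` and `LABEL` are illustrative.

```python
import math

# Hypothetical four-state grammar: two membrane states keep the
# inside/outside alternation consistent along the chain.
STATES = ["i", "M_io", "o", "M_oi"]
ALLOWED = {
    "i": ["i", "M_io"],
    "M_io": ["M_io", "o"],
    "o": ["o", "M_oi"],
    "M_oi": ["M_oi", "i"],
}
LABEL = {"i": "i", "M_io": "M", "o": "o", "M_oi": "M"}


def consensus_topology(predictions, eps=1e-3):
    """Combine equal-length label strings (alphabet i/M/o) into one valid path."""
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all predictions must have the same length")

    def score(state, pos):
        # Log of the vote fraction for this state's label; eps avoids log(0).
        votes = sum(p[pos] == LABEL[state] for p in predictions)
        return math.log(votes / len(predictions) + eps)

    # best[s] holds the best log-score of any valid path ending in state s.
    best = {s: score(s, 0) for s in STATES}
    back = []
    for pos in range(1, n):
        new_best, pointers = {}, {}
        for prev in STATES:
            for nxt in ALLOWED[prev]:
                cand = best[prev] + score(nxt, pos)
                if nxt not in new_best or cand > new_best[nxt]:
                    new_best[nxt], pointers[nxt] = cand, prev
        best = new_best
        back.append(pointers)

    # Trace back from the best final state.
    state = max(best, key=best.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(LABEL[s] for s in reversed(path))
```

Unlike a plain per-residue majority vote, the grammar means the output is always a topology in which every strand crosses the membrane, even when the individual votes disagree.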

    Algorithms for incorporating prior topological information in HMMs: application to transmembrane proteins

    BACKGROUND: Hidden Markov Models (HMMs) have been extensively used in computational molecular biology for modelling protein and nucleic acid sequences. In many applications, such as transmembrane protein topology prediction, incorporating a limited amount of topological information from biochemical experiments has proved a very useful strategy that remarkably increases the performance of even the top-scoring methods. However, no clear and formal explanation of algorithms that retain the probabilistic interpretation of the models has been presented so far in the literature. RESULTS: Here we present a simple method that allows incorporation of prior topological information concerning the sequences at hand, while the HMMs retain their full probabilistic interpretation in terms of conditional probabilities. We present modifications to the standard Forward and Backward algorithms of HMMs and show explicitly how reliable predictions can be obtained with these modifications, using all the algorithms currently available for decoding HMMs. A similar procedure may be used during training, aiming at optimizing the labels of the HMM's classes, especially in cases such as transmembrane proteins where the labels of the membrane-spanning segments are inherently misplaced. We present an application of this approach by developing a method to predict the transmembrane regions of alpha-helical membrane proteins, trained on crystallographically solved data. We show that this method compares well against already established algorithms presented in the literature, and that it is extremely useful in practical applications. CONCLUSION: The algorithms presented here are easily implemented in any kind of Hidden Markov Model, whereas the prediction method (HMM-TM) is freely available for academic users at , offering the most advanced decoding options currently available.
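To illustrate how a Forward pass can take prior label information into account, here is a minimal sketch under stated assumptions; it is not the formulation published by the authors. It assumes a discrete-emission HMM in which every state carries a class label, and it multiplies the forward variable by a 0/1 mask so that, at positions with an experimentally known label, only states of that label contribute. The function name `constrained_forward` and its parameters are hypothetical.

```python
import numpy as np


def constrained_forward(obs, start, trans, emit, state_labels, known_labels=None):
    """Scaled Forward algorithm; returns log P(obs, known labels).

    obs          : sequence of symbol indices
    start        : (n_states,) initial state probabilities
    trans        : (n_states, n_states) transition matrix, rows sum to 1
    emit         : (n_states, n_symbols) emission matrix
    state_labels : label of each state, e.g. ["i", "M"]
    known_labels : dict {position: label} of prior information (may be empty)
    """
    n_states = len(start)
    known_labels = known_labels or {}

    def mask(pos):
        # 1 for states compatible with the known label at pos, else 0;
        # positions without prior information allow every state.
        if pos not in known_labels:
            return np.ones(n_states)
        return np.array([float(lab == known_labels[pos]) for lab in state_labels])

    alpha = start * emit[:, obs[0]] * mask(0)
    log_prob = 0.0
    for pos in range(len(obs)):
        if pos > 0:
            alpha = (alpha @ trans) * emit[:, obs[pos]] * mask(pos)
        scale = alpha.sum()
        if scale == 0.0:
            return -np.inf  # the constraints are impossible under the model
        log_prob += np.log(scale)
        alpha = alpha / scale  # rescale to avoid numerical underflow
    return log_prob
```

Because the mask only removes incompatible paths, the result remains a proper joint probability: subtracting the unconstrained log-likelihood from the constrained one gives the log posterior probability of the supplied labels given the sequence.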

    A Hidden Markov Model method, capable of predicting and discriminating β-barrel outer membrane proteins

    BACKGROUND: Integral membrane proteins constitute about 20–30% of all proteins in the fully sequenced genomes. They come in two structural classes, the α-helical and the β-barrel membrane proteins, which differ in physicochemical characteristics, structure and localization. While transmembrane segment prediction for the α-helical integral membrane proteins appears to be an easy task nowadays, the same is much more difficult for the β-barrel membrane proteins. We developed a method, based on a Hidden Markov Model, capable of predicting the transmembrane β-strands of the outer membrane proteins of Gram-negative bacteria, and of discriminating those proteins from water-soluble proteins in large datasets. The model is trained in a discriminative manner, aiming at maximizing the probability of correct predictions rather than the likelihood of the sequences. RESULTS: Training was performed on a non-redundant database of 14 outer membrane proteins with structures known at atomic resolution. The model was tested with a jackknife procedure, yielding a per-residue accuracy of 84.2% and a correlation coefficient of 0.72, whereas for the self-consistency test the per-residue accuracy was 88.1% and the correlation coefficient 0.824. The total number of correctly predicted topologies is 10 out of 14 in the self-consistency test, and 9 out of 14 in the jackknife test. Furthermore, the model is capable of discriminating outer membrane from water-soluble proteins in large-scale applications, with a success rate of 88.8% and 89.2% for the correct classification of outer membrane and water-soluble proteins respectively, the highest rates obtained in the literature. That test was performed independently on a set of known outer membrane proteins with low sequence identity with each other and with the proteins of the training set. CONCLUSION: Based on the above, we developed a strategy that enabled us to screen the entire proteome of E. coli for outer membrane proteins. The results were satisfactory, so the method presented here appears to be suitable for screening entire proteomes for the discovery of novel outer membrane proteins. A web interface available for non-commercial users is located at: , and it is the only freely available HMM-based predictor for β-barrel outer membrane protein topology.
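The two per-residue measures reported above can be computed from an observed and a predicted label string. The sketch below assumes a two-class view (transmembrane `M` versus everything else) and assumes that the reported correlation coefficient is the Matthews correlation coefficient, which the abstract does not state explicitly. The function name `per_residue_scores` is illustrative.

```python
import math


def per_residue_scores(observed, predicted, positive="M"):
    """Return (accuracy, Matthews correlation) for two equal-length label strings."""
    if len(observed) != len(predicted):
        raise ValueError("label strings must have the same length")
    tp = tn = fp = fn = 0
    for obs_label, pred_label in zip(observed, predicted):
        is_pos, called_pos = obs_label == positive, pred_label == positive
        if is_pos and called_pos:
            tp += 1
        elif not is_pos and not called_pos:
            tn += 1
        elif called_pos:
            fp += 1  # predicted strand residue that is not in a strand
        else:
            fn += 1  # strand residue that was missed
    accuracy = (tp + tn) / len(observed)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # The coefficient is undefined when a margin is empty; report 0 in that case.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc
```

For example, `per_residue_scores("iiMMMMoo", "iiMMMooo")` counts one missed strand residue out of eight positions, giving an accuracy of 0.875.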

    Meta-Analysis of Family-Based and Case-Control Genetic Association Studies that Use the Same Cases

    In many cases in genetic epidemiology, investigators, in an effort to control for different sources of confounding while increasing power, perform both a family-based and a population-based case-control study within the same population, using the same, or a largely overlapping, set of cases. Various methods have been proposed for performing a combined analysis, but they all require access to individual data, which are difficult to gather in a meta-analysis. Here, we propose a simple and efficient summary-based method for performing the meta-analysis. The key point, in contrast to the earlier methods that need individual data, is the calculation of the covariance between the study estimates (log-Odds Ratios), using only data derived from the literature in the form of a 2×2 contingency table. Afterwards, the studies can easily be combined either in a two-step procedure using traditional methods for univariate meta-analysis or in a single-step approach using hierarchical models. In either case, the meta-analysis can be performed using standard software. The increased sample size increases the statistical power of the meta-analysis, and the procedure allows several diagnostics to be performed (publication bias, cumulative meta-analysis, sensitivity analysis). The method is evaluated on a dataset of 356 single nucleotide polymorphisms (SNPs) which were evaluated for their potential association with Respiratory Syncytial Virus Bronchiolitis (RSV), and it is subsequently applied in a meta-analysis concerning the association of the 10-Repeat Allele of a VNTR Polymorphism in the 3’-UTR of Dopamine Transporter Gene with Attention Deficit Hyperactivity Disorder (ADHD), as well as in a genome-wide association study for Multiple Sclerosis. Implementation of the method is straightforward, and a Stata program implementing the methods presented here is given in the Appendix.
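The two-step idea can be sketched as follows. This is a hedged illustration in Python rather than the Stata program mentioned in the abstract, and it does not reproduce the paper's covariance formula: the covariance `cov` between the two log-odds ratios is treated as an input that is assumed to have been computed already. The log-odds ratio and its variance use the standard Woolf formula for a 2×2 table, and the pooling step is an ordinary fixed-effect generalised least squares estimate for two correlated estimates. The function names are illustrative.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Log-OR and Woolf variance from a 2x2 table.

    a, b: exposed / unexposed cases; c, d: exposed / unexposed controls.
    """
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pool_correlated(y1, v1, y2, v2, cov):
    """Fixed-effect (generalised least squares) pooling of two correlated estimates.

    Returns the pooled estimate and its variance.
    """
    det = v1 * v2 - cov ** 2
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Row sums of the inverse covariance matrix act as the weights.
    w1 = (v2 - cov) / det
    w2 = (v1 - cov) / det
    pooled = (w1 * y1 + w2 * y2) / (w1 + w2)
    return pooled, 1 / (w1 + w2)
```

With `cov = 0` this reduces to the usual inverse-variance weighting. A positive covariance, as expected when the same cases enter both studies, gives a larger pooled variance than treating the studies as independent would suggest.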

    Prediction of cell wall sorting signals in gram-positive bacteria with a hidden Markov model: Application to complete genomes

    Surface proteins in Gram-positive bacteria are frequently implicated in virulence. We have focused on a group of extracellular cell wall-attached proteins (CWPs), containing an LPXTG motif for cleavage and covalent coupling to peptidoglycan by sortase enzymes. A hidden Markov model (HMM) approach for predicting the LPXTG-anchored cell wall proteins of Gram-positive bacteria was developed and compared against existing methods. The model is parsimonious in terms of the number of freely estimated parameters, and it has proved to be very sensitive and specific on a training set of 55 experimentally verified LPXTG-anchored cell wall proteins as well as on reliable data sets of globular and transmembrane proteins. In order to identify such proteins in Gram-positive bacteria, a comprehensive analysis of 94 completely sequenced genomes has been performed. We identified, in total, 860 LPXTG-anchored cell wall proteins, a number that is significantly higher than those obtained by other available methods. Of these proteins, 237 are hypothetical proteins according to the annotation of SwissProt, and 88 had no homologs in the SwissProt database - this might be evidence that they are members of newly identified families of CWPs. The prediction tool, the database with the proteins identified in the genomes, and supplementary material are available online at http://bioinformatics.biol.uoa.gr/CW-PRED/. © 2008 Imperial College Press
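For orientation only, the LPXTG motif itself can be located with a plain pattern search. This is a deliberately naive stand-in and not the HMM described above: it assumes protein sequences in one-letter code, reports every occurrence of L-P-any residue-T-G, and makes no attempt to check the rest of the sorting signal, so it will return many hits that a trained model would reject. The function name `find_lpxtg` is illustrative.

```python
import re

# L, P, any single residue, T, G.
LPXTG = re.compile(r"LP.TG")


def find_lpxtg(sequence):
    """Return the 0-based start positions of LPXTG motifs in a protein sequence."""
    return [match.start() for match in LPXTG.finditer(sequence.upper())]
```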